MEMO: Coverage-guided Model Generation For Deep Learning Library Testing
Recent deep learning (DL) applications are mostly built on top of DL
libraries. The quality assurance of these libraries is critical to the
dependable deployment of DL applications. A few techniques have thereby been
proposed to test DL libraries by generating DL models as test inputs. Then
these techniques feed those DL models to DL libraries for making inferences, in
order to exercise the DL library modules related to a DL model's execution.
However, the test effectiveness of these techniques is constrained by the
diversity of generated DL models. Our investigation finds that these techniques
can cover at most 11.7% of layer pairs (i.e., call sequence between two layer
APIs) and 55.8% of layer parameters (e.g., "padding" in Conv2D). As a result,
we find that many bugs arising from specific layer pairs and parameters can be
missed by existing techniques.
In view of the limitations of existing DL library testing techniques, we
propose MEMO to efficiently generate diverse DL models by exploring layer
types, layer pairs, and layer parameters. MEMO: (1) designs an initial model
reduction technique to boost test efficiency without compromising model
diversity; and (2) designs a set of mutation operators for a customized Markov
Chain Monte Carlo (MCMC) algorithm to explore new layer types, layer pairs, and
layer parameters. We evaluate MEMO on seven popular DL libraries, including
four for model execution (TensorFlow, PyTorch, MXNet, and ONNX) and three
for model conversions (Keras-MXNet, TF2ONNX, ONNX2PyTorch). The evaluation
result shows that MEMO outperforms recent works by covering 10.3% more layer
pairs, 15.3% more layer parameters, and 2.3% more library branches. Moreover,
MEMO detects 29 new bugs in the latest versions of the DL libraries, with 17 of
them confirmed by DL library developers, and 5 of those confirmed bugs have
been fixed.
Comment: 11 pages, 8 figures
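The layer-pair coverage metric mentioned in the MEMO abstract can be illustrated with a minimal sketch. It is not MEMO's implementation: the representation of a model as a plain list of layer-type names, the helper names `layer_pairs` and `pair_coverage`, and the choice of "all ordered pairs of layer types" as the coverage universe are assumptions made here for illustration only.

```python
def layer_pairs(model):
    """Return the consecutive (upstream, downstream) layer-type pairs.

    Assumption: `model` is a sequential list of layer-type names.
    """
    return set(zip(model, model[1:]))


def pair_coverage(models, layer_types):
    """Fraction of all ordered layer-type pairs exercised by `models`.

    Assumption: every ordered pair of layer types counts as coverable.
    """
    universe = {(a, b) for a in layer_types for b in layer_types}
    covered = set()
    for model in models:
        covered |= layer_pairs(model) & universe
    return len(covered) / len(universe)


layer_types = ["Conv2D", "ReLU", "MaxPool2D", "Dense"]
models = [
    ["Conv2D", "ReLU", "MaxPool2D"],
    ["Conv2D", "ReLU", "Dense"],
]
coverage = pair_coverage(models, layer_types)
print(f"layer-pair coverage: {coverage:.4f}")
```

The two toy models share the Conv2D-to-ReLU pair, so adding the second model contributes only one new pair. This is the kind of redundancy that a diversity-driven generator would try to avoid.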
Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting
Automatically detecting software failures is an important task and a
longstanding challenge. It requires finding failure-inducing test cases whose
test input can trigger the software's fault, and constructing an automated
oracle to detect the software's incorrect behaviors. Recent advances in
large language models (LLMs) motivate us to study how far this challenge can
be addressed by ChatGPT, a state-of-the-art LLM. Unfortunately, our study shows
that ChatGPT has a low probability (28.8%) of finding correct failure-inducing
test cases for buggy programs. A possible reason is that finding
failure-inducing test cases requires analyzing the subtle code differences
between a buggy program and its correct version. When these two versions have
similar syntax, ChatGPT is weak at recognizing subtle code differences. Our
insight is that ChatGPT's performance can be substantially enhanced when
ChatGPT is guided to focus on the subtle code difference. We have an
interesting observation that ChatGPT is effective in inferring the intended
behaviors of a buggy program. The intended behavior can be leveraged to
synthesize programs, in order to make the subtle code difference between a
buggy program and its correct version (i.e., the synthesized program) explicit.
Driven by this observation, we propose a novel approach that synergistically
combines ChatGPT and differential testing to find failure-inducing test cases.
We evaluate our approach on QuixBugs (a benchmark of buggy programs), and
compare it with state-of-the-art baselines, including direct use of ChatGPT and
Pynguin. The experimental result shows that our approach has a much higher
probability (77.8%) of finding correct failure-inducing test cases, 2.7X that
of the best baseline.
Comment: Accepted to the 38th IEEE/ACM International Conference on Automated
Software Engineering (ASE 2023)
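The differential-testing half of the approach described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' tool: `program_under_test` is a hypothetical buggy program, `reference_version` stands in for a program synthesized from the inferred intention, and the candidate inputs are hand-picked rather than generated by an LLM.

```python
def program_under_test(a, b):
    """Hypothetical buggy program: intended to return |a - b|."""
    return a - b  # bug: the result is negative when a < b


def reference_version(a, b):
    """Stand-in for a program synthesized from the inferred intention."""
    return abs(a - b)


def failure_inducing_inputs(put, reference, candidates):
    """Return the candidate inputs on which the two programs disagree."""
    return [args for args in candidates if put(*args) != reference(*args)]


candidates = [(5, 3), (3, 5), (0, 0), (-2, 4)]
failing = failure_inducing_inputs(program_under_test, reference_version, candidates)
print(failing)
```

A disagreement only shows that the two versions differ on that input. It identifies a true failure of the program under test only to the extent that the reference version captures the intended behavior.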
Differential Spectral Normalization (DSN) for PDE Discovery
Partial differential equations (PDEs) play a prominent role in many disciplines for describing the governing systems of interest. Traditionally, PDEs are derived from first principles. In the era of big data, the need to uncover PDEs from massive datasets is emerging and becoming essential. One of the latest advances in PDE discovery models is PDE-Net, which has shown promising predictive power with its moment-constrained convolutional filters, but may suffer from noisy data and the numerical instability intrinsic to numerical differentiation. We propose a novel and robust regularization method tailored for moment-constrained convolutional filters, namely Differential Spectral Normalization (DSN), to allow accurate estimation of coefficient functions and stable prediction of dynamics over a long time horizon. We compared the effectiveness of DSN against batch normalization, dropout, spectral normalization, weight decay, weight normalization, Jacobian regularization, and orthonormal regularization, and found empirical evidence that DSN is the most effective, learning the convolutional filters in a robust manner. Numerical experiments further reveal that DSN has substantial potential to uncover hidden PDEs in a scarce-data setting and to predict dynamical behavior over a long time horizon, even in a noisy environment where all data samples are contaminated with noise.
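The abstract above does not spell out how DSN is computed, so the sketch below shows only plain spectral normalization, one of the baselines it is compared against. It estimates the largest singular value of a weight matrix by power iteration and rescales the matrix by it. The function names, the pure-Python matrix representation, and the iteration count are assumptions made for illustration.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W by power iteration on W^T W.

    Assumption: W is nonzero and the all-ones start vector is not
    orthogonal to the leading right singular vector.
    """
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = matvec(W, v)
    return math.sqrt(sum(x * x for x in u))


def spectrally_normalize(W, iters=100):
    """Rescale W so that its largest singular value is approximately 1."""
    sigma = spectral_norm(W, iters)
    return [[w / sigma for w in row] for row in W]


W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
print(sigma, W_sn)
```

Bounding the largest singular value limits how much a layer can amplify its input, which is the usual motivation for this family of regularizers when inputs are noisy.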
Diagnostic extended usefulness of RMI: comparison of four risk of malignancy index in preoperative differentiation of borderline ovarian tumors and benign ovarian tumors
Background: This study aimed to examine the performance of four risk of malignancy indices (RMIs) in discriminating borderline ovarian tumors (BOTs) from benign ovarian masses in daily clinical practice.
Methods: A total of 162 women with BOTs and 379 women with benign ovarian tumors diagnosed at the Second Affiliated Hospital of Harbin Medical University from January 2012 to December 2016 were enrolled in this retrospective study. The patients with BOTs were also classified into serous borderline ovarian tumor (SBOT) and mucinous borderline ovarian tumor (MBOT) subgroups. Preoperative ultrasound findings, cancer antigen 125 (CA125), and menopausal status were reviewed. The area under the receiver operating characteristic (ROC) curve (AUC) and performance indices of RMI I, RMI II, RMI III, and RMI IV were calculated and compared for discrimination between benign ovarian tumors and BOTs.
Results: RMI I had the highest AUC (0.825, 95% CI: 0.790–0.856) among the four RMIs in the BOTs group. Similar results were found in the SBOT (0.839, 95% CI: 0.804–0.871) and MBOT (0.791, 95% CI: 0.749–0.829) subgroups. RMI I had the highest specificity in the BOTs group (87.6%, 95% CI: 83.9–90.7%), the SBOT subgroup (87.6%, 95% CI: 83.9–90.7%), and the MBOT subgroup (87.6%, 95% CI: 83.9–90.7%). RMI II had the highest sensitivity in the BOTs group (69.75%, 95% CI: 62.1–76.7%), the SBOT subgroup (74.34%, 95% CI: 65.3–82.1%), and the MBOT subgroup (59.18%, 95% CI: 44.2–73.0%).
Conclusion: Compared with the other RMIs, RMI I was the best-performing method for differentiating BOTs from benign ovarian tumors. RMI I also performed best in discriminating SBOT from benign ovarian tumors.
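For readers less familiar with the performance indices reported above, the following sketch shows how sensitivity and specificity follow from confusion-matrix counts. The cohort sizes (162 BOTs, 379 benign tumors) come from the abstract. The true-positive and true-negative counts are not reported there: they are back-calculated here as the integers that reproduce the published percentages, and should be read as an assumption.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased cases that the index flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of benign cases that the index classifies as negative."""
    return true_neg / (true_neg + false_pos)


N_BOT = 162      # from the abstract
N_BENIGN = 379   # from the abstract

# Assumed counts, inferred from the reported percentages (not from the paper).
rmi2_true_pos = 113   # BOTs flagged by RMI II
rmi1_true_neg = 332   # benign tumors correctly classified by RMI I

rmi2_sens = sensitivity(rmi2_true_pos, N_BOT - rmi2_true_pos)
rmi1_spec = specificity(rmi1_true_neg, N_BENIGN - rmi1_true_neg)
print(f"RMI II sensitivity: {100 * rmi2_sens:.2f}%")
print(f"RMI I specificity:  {100 * rmi1_spec:.1f}%")
```

Because specificity depends only on the benign group, it is identical across the BOTs, SBOT, and MBOT comparisons, which matches the repeated 87.6% in the results.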
Multigene Profiling of Circulating Tumor Cells in Esophageal Squamous Cell Carcinoma Identifies Prognostic Cancer Driver Genes Associated with Epithelial-Mesenchymal-Transition Progression and Chemoresistance
We investigated the clinical significance of circulating tumor cells (CTCs) in cancer progression by detecting multiple cancer driver genes associated with epithelial-to-mesenchymal transition (EMT) at the transcript level. A 10-gene panel, comprising CCND1, ECT2, EpCAM, FSCN1, KRT5, KRT18, MET, TFRC, TWIST1, and VEGFC, was established for characterizing CTCs from mouse esophageal squamous cell carcinoma (ESCC) xenograft models and clinical ESCC peripheral blood (PB) samples. Correlations between gene expression in CTCs from PB samples (n = 77) and clinicopathological features in ESCC patients (n = 55) were examined. The presence of CTCs at baseline was significantly correlated with tumor size (p = 0.031). CTC-high status was significantly correlated with advanced cancer stages (p = 0.013) and distant metastasis (p = 0.029). High mRNA levels of TWIST1 (Hazard Ratio (HR) = 5.44, p = 0.007), VEGFC (HR = 6.67, p …), TFRC (HR = 2.63, p = 0.034), and EpCAM (HR = 2.53, p = 0.041) at baseline were significantly associated with shorter overall survival (OS) in ESCC patients. This study also revealed that TWIST1 facilitates EMT and enhances malignant potential by promoting tumor migration, invasion, and cisplatin chemoresistance through the TWIST1-TGFBI-ZEB1 axis in ESCC, highlighting the prognostic and therapeutic potential of TWIST1 in clinical ESCC treatment.
Evaluating the environmental impact of contaminated sediment column stabilized by deep cement mixing
Circulating Tumor Cell Enumeration for Serial Monitoring of Treatment Outcomes for Locally Advanced Esophageal Squamous Cell Carcinoma
We aim to reveal the clinical significance and potential usefulness of dynamic monitoring of CTCs to track therapeutic responses and improve survival for advanced ESCC patients. Peripheral blood (PB) (n = 389) and azygos vein blood (AVB) (n = 13) samples were collected prospectively from 88 ESCC patients undergoing curative surgery from 2017 to 2022. Longitudinal CTC enumeration was performed with epithelial (EpCAM/pan-cytokeratins/MUC1) and mesenchymal (vimentin) markers at 12 serial timepoints covering pre-treatment, post-treatment/pre-surgery, post-surgery follow-up over 3 years, and relapse. Longitudinal real-time CTC analysis in PB and AVB suggests that more CTCs are released into the circulation early, at pre-surgery and 3 months post-surgery, in the CTRT group than in the up-front surgery group. High CTC levels at pre-treatment and at 1 or 3 months post-surgery, and unfavorable changes in CTC levels between post-treatment/pre-surgery and 1 or 3 months post-surgery (Hazard Ratio (HR) = 6.662, p < 0.001), were independent prognosticators for curative treatment. Unfavorable pre-surgery CTC status was an independent prognostic and predictive factor for neoadjuvant treatment efficacy (HR = 3.652, p = 0.035). Aggressive CTC clusters were observed more frequently in AVB than in PB. Their role as an independent prognosticator of relapse was first reported in ESCC (HR = 2.539, p = 0.068). CTC clusters and longitudinal CTC monitoring provide useful prognostic information and potential predictive biomarkers to help guide clinicians in improving disease management.